The UvA color document dataset
Similar resources
The IUPR Dataset of Camera-Captured Document Images
Major challenges in camera-based document analysis are dealing with uneven shadows, a high degree of curl, and perspective distortions. In CBDAR 2007, we introduced the first dataset (DFKI-I) of camera-captured document images in conjunction with a page dewarping contest. One of the main limitations of this dataset is that it contains images only from technical books with simple layouts and moderat...
A Large Benchmark Dataset for Web Document Clustering
Targeting useful and relevant information on the WWW is a topical and highly complicated research area. A thriving research effort that feeds into this area is document clustering, which overlaps closely with areas usually known as text classification and text categorisation. A foundational aspect of such research (which has been proven over and over again in other research disciplines) is the ...
Top-K Color Queries for Document Retrieval
In this paper we describe a new efficient (in fact optimal) data structure for the top-K color problem. Each element of an array A is assigned a color c with priority p(c). For a query range [a, b] and a value K, we have to report K colors with the highest priorities among all colors that occur in A[a..b], sorted in reverse order by their priorities. We show that such queries can be answered in...
Color reduction for complex document images
A new technique for color reduction of complex document images is presented in this article. It significantly reduces the number of colors in the document image (fewer than 15 colors in most cases) so as to have solid characters and uniform local backgrounds. Therefore, this technique can be used as a preprocessing step by text information extraction applications. Specifically, using the ...
Document Processing for Automatic Color Form Dropout
Color dropout refers to the process of converting color form documents to black and white by removing the colors that are part of the blank form and maintaining only the information entered in the form. In this paper, no prior knowledge of the form type is assumed. Color dropout is performed by associating darker non-dropout colors with information that is entered in the form and needs to be pr...
Journal
Journal title: International Journal of Document Analysis and Recognition (IJDAR)
Year: 2005
ISSN: 1433-2833,1433-2825
DOI: 10.1007/s10032-004-0135-2